In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account. Similarly, the posterior probability distribution is the distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.
Contents |
Let us have an a priori belief that the probability distribution function is and an observation with the likelihood , then the posterior probability is defined as
The posterior probability can be written in the memorable form as
Suppose there is a mixed school having 60% boys and 40% girls as students. The girl students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.
The event A is that the student observed is a girl, and the event B is that the student observed is wearing trousers. To compute P(A|B), we first need to know:
Given all this information, the probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:
The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant, as follows:
gives the posterior probability density function for a random variable X given the data Y = y, where
In classification posterior probabilities reflect the uncertainty of assessing an observation to particular class, see also Class membership probabilities. While Statistical classification methods by definition generate posterior probabilities, Machine Learners usually supply membership values which do not induce any probabilistic confidence. It is desirable, to transform or re-scale membership values to class membership probabilities, since they are comparable and additionally easier applicable for post-processing.